[PLT-999] Vb/chunk by size plt 999 #1648
Merged
+429
−318
8c70057 to a7514ce
…ws_sync, remove unused MAX_DATAROW_PER_API_OPERATION
a7514ce to ac76e38
libs/labelbox/src/labelbox/schema/internal/descriptor_file_creator.py (outdated, resolved)
libs/labelbox/src/labelbox/schema/internal/descriptor_file_creator.py (outdated, resolved)
libs/labelbox/src/labelbox/schema/internal/descriptor_file_creator.py (outdated, resolved)
libs/labelbox/src/labelbox/schema/internal/descriptor_file_creator.py (outdated, resolved)
libs/labelbox/src/labelbox/schema/internal/descriptor_file_creator.py (outdated, resolved)
libs/labelbox/src/labelbox/schema/internal/data_row_uploader.py (outdated, resolved)
libs/labelbox/src/labelbox/schema/internal/descriptor_file_creator.py (outdated, resolved)
2aa342b to 6ea8239
return UploadManifest(source="SDK",
                      item_count=len(specs),
                      chunk_uris=chunk_uris)
return UploadManifest(source="SDK",
Sorry I missed this in both this and the previous review, but can the source value ("SDK") be an enum?
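For illustration, one way the enum suggestion could look. This is only a sketch: `UploadSource` is a hypothetical name, and the `UploadManifest` below is a reduced stand-in carrying just the three fields visible in the diff, not the class from this PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class UploadSource(str, Enum):
    """Hypothetical enum of upload origins; only "SDK" appears in the diff."""
    SDK = "SDK"


@dataclass
class UploadManifest:
    """Stand-in for the real class, limited to the fields shown in the diff."""
    source: UploadSource
    item_count: int
    chunk_uris: List[str]


def build_manifest(specs: list, chunk_uris: List[str]) -> UploadManifest:
    # Same call as in the diff, with the string literal replaced by the enum member.
    return UploadManifest(source=UploadSource.SDK,
                          item_count=len(specs),
                          chunk_uris=chunk_uris)
```

Mixing in `str` keeps the member equal to the plain string "SDK", so code that compares against the literal would keep working under this sketch.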
6ea8239 to 3bcdc23
sfendell-labelbox previously approved these changes Jun 4, 2024
3bcdc23 to 31ba730
adrian-chang approved these changes Jun 4, 2024
Description
This PR updates our chunking logic for Dataset create_data_rows and upsert_data_rows to chunk by file size rather than by the number of data rows. This produces files of similar size, which makes processing more predictable and reduces the chance of an error or a performance issue caused by a very large file.
We have set the file size limit to 10MB based on the following considerations:
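As a rough illustration of chunking by size rather than by row count, here is a minimal sketch. It is not the code from this PR: `chunk_by_size` and `MAX_CHUNK_SIZE_BYTES` are hypothetical names, the 10MB limit is assumed to mean 10 * 1024 * 1024 bytes, and each item's size is approximated by its JSON-encoded length.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Assumption: "10MB" is read as 10 MiB; the constant name is hypothetical.
MAX_CHUNK_SIZE_BYTES = 10 * 1024 * 1024


def chunk_by_size(
    items: Iterable[Dict[str, Any]],
    max_bytes: int = MAX_CHUNK_SIZE_BYTES,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of items whose combined JSON size stays within max_bytes.

    Sizes are approximate: the brackets and separators of the enclosing JSON
    array are ignored. An item larger than max_bytes is emitted alone.
    """
    chunk: List[Dict[str, Any]] = []
    chunk_size = 0
    for item in items:
        item_size = len(json.dumps(item).encode("utf-8"))
        # Close the current chunk before it would exceed the limit.
        if chunk and chunk_size + item_size > max_bytes:
            yield chunk
            chunk, chunk_size = [], 0
        chunk.append(item)
        chunk_size += item_size
    if chunk:
        yield chunk
```

Each yielded list would then be written out as one file, so file sizes stay close to the limit whether the individual data rows are small or large.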
Fixes # (issue)
Type of change
Please delete options that are not relevant.
All Submissions
New Feature Submissions
Changes to Core Features